DX12 Basics
A work-in-progress reference for some DX12 basics, written as I learn. Check out Frank D. Luna’s books for an excellent and thorough introduction to the topic.
Conceptual Diagram
 
Executing Commands
An application submits commands to the GPU via a CommandQueue. Execution is asynchronous: the GPU idles if the queue is empty, and the CPU stalls on submission if the queue is full. A good application keeps both the CPU and the GPU busy by not letting the queue run empty or fill up.
Commands are recorded in CommandLists, which are submitted to the CommandQueue via ExecuteCommandLists(). CommandLists are executed in the order in which they are submitted.
A CommandList must be Close()d before it
can be executed.
Once a CommandList has been submitted with ExecuteCommandLists(), it can be Reset() and re-used to record a new set of commands, even while the GPU is still executing the previous ones. Reset() re-initializes the CommandList and is cheaper than destroying it and creating a new one.
Commands in a CommandList are recorded into a CommandAllocator, which is the memory backing of the commands. Therefore, the CommandAllocator cannot be reset until the GPU has finished executing those commands, which requires CPU-GPU synchronization with a Fence.
Once the GPU has finished executing a CommandList, the
CommandList’s CommandAllocator can be
Reset() to record new commands.
Multiple CommandLists can be associated with the same CommandAllocator. However, only one of them can be recording at any given time; the others must be closed. This is because commands are allocated contiguously in the CommandAllocator while a CommandList is recording.
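When a CommandList is created, it starts in the open (recording) state, and Reset() also returns it to the open state. Since a typical render loop calls Reset() before recording each frame, it can be convenient to Close() a newly created CommandList right away.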
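The CommandList/CommandAllocator rules above can be summarized as a small state machine. The following is a minimal sketch, assuming a CPU-only toy model: ToyAllocator, ToyCommandList, and Submit() are hypothetical names, not D3D12 types, and the asserts encode the rules (close before execute, one recording list per allocator, reset the allocator only after the GPU is done). Unlike the real ID3D12CommandAllocator::Reset(), the toy Reset() takes the GPU's completed fence value so that it can check the rule itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical CPU-only model of the rules above; these are NOT D3D12 types.
struct ToyAllocator {
    std::vector<std::string> commands;  // memory backing for recorded commands
    bool recording = false;             // is some list currently recording into it?
    std::uint64_t pendingFence = 0;     // fence value that must complete before Reset()

    // Toy-only signature: takes the fence value the GPU has completed.
    void Reset(std::uint64_t completedFence) {
        assert(!recording && "a list is still recording into this allocator");
        assert(completedFence >= pendingFence && "the GPU may still read these commands");
        commands.clear();
    }
};

struct ToyCommandList {
    ToyAllocator* alloc;
    bool open = true;  // a newly created list starts open (recording)

    explicit ToyCommandList(ToyAllocator* a) : alloc(a) {
        assert(!a->recording && "only one list may record per allocator");
        a->recording = true;
    }
    void Record(const std::string& cmd) {
        assert(open && "cannot record into a closed list");
        alloc->commands.push_back(cmd);
    }
    void Close() {
        assert(open);
        open = false;
        alloc->recording = false;
    }
    // Re-opens the list for recording into allocator `a`.
    void Reset(ToyAllocator* a) {
        assert(!open && "close the list before resetting it");
        assert(!a->recording && "only one list may record per allocator");
        alloc = a;
        open = true;
        a->recording = true;
    }
};

// Stands in for ExecuteCommandLists() followed by Signal(fenceValue): tags the
// allocator with the fence value to wait for and returns the command count.
std::size_t Submit(const ToyCommandList& list, std::uint64_t fenceValue) {
    assert(!list.open && "a list must be closed before it can be executed");
    list.alloc->pendingFence = fenceValue;
    return list.alloc->commands.size();
}
```

In use, the list is closed right after creation, reset and recorded each frame, and may be reset again immediately after Submit(); only the allocator has to wait until the completed fence value reaches the one passed to Submit().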
Resources and Descriptors/Views
DX12 (and Vulkan) decouple resources and descriptors. Descriptors are also known as views.
A Resource is the texture or buffer data in memory.
A Descriptor describes how the Resource is accessed in different stages of the graphics pipeline. For example, a render target view (RTV) lets the pipeline draw into a texture, and a shader resource view (SRV) lets a shader read from a texture.
A Descriptor can also map to a subregion of the
Resource and reinterpret the type of the data elements (for
typeless resources).
If a Resource is typeless, then the
Descriptor must specify a type. Typed
Resources are best for performance; use typeless
Resources only when strictly necessary.
Descriptor creation incurs some validation overhead.
Create them during initialization if possible.
Types of Descriptors
- CBV: constant buffer view, for reading constant buffer data.
- SRV: shader resource view, for reading textures (and buffers) in shaders.
- UAV: unordered access view, to read/write texture and buffer data.
- Sampler: to sample textures via their SRVs.
- RTV: render target view, to render into textures.
- DSV: depth/stencil view, to describe depth/stencil buffers.
Descriptor Heaps
Descriptors are allocated from a DescriptorHeap, which is the memory backing for descriptors of a given heap type. CBVs, SRVs, and UAVs share one heap type, while samplers, RTVs, and DSVs each have their own. An application needs at least one DescriptorHeap for each heap type it uses, and multiple DescriptorHeaps of the same type can also exist.
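Descriptors in a heap are addressed by offsetting from the heap's start handle. In D3D12 the start comes from GetCPUDescriptorHandleForHeapStart() and the stride from GetDescriptorHandleIncrementSize(), and the stride is hardware-dependent, so real code must query it. The sketch below models only that arithmetic as a linear allocator; ToyDescriptorHeap is a hypothetical type, and the start address and increment used in the usage example are made-up values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical linear allocator for one descriptor heap (not a D3D12 type).
struct ToyDescriptorHeap {
    std::uint64_t startPtr;       // address of descriptor 0 (heap start handle)
    std::uint32_t incrementSize;  // bytes between consecutive descriptors
    std::uint32_t capacity;       // number of descriptors in the heap
    std::uint32_t next = 0;       // next free slot

    // Handle of descriptor `index`: start + index * increment.
    std::uint64_t HandleAt(std::uint32_t index) const {
        assert(index < capacity);
        return startPtr + std::uint64_t(index) * incrementSize;
    }
    // Hands out the next free slot, e.g. one RTV per swap-chain buffer.
    std::uint64_t Allocate() {
        assert(next < capacity && "descriptor heap is full");
        return HandleAt(next++);
    }
};
```

For example, with an assumed start of 0x1000 and an assumed increment of 32 bytes, the second descriptor lives at 0x1020. A linear allocator is the simplest choice; applications that free descriptors individually need a free list instead.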
Resource Heaps
Resources are also allocated in heaps. When creating a resource
(CreateCommittedResource()), we must specify the desired
heap type:
- Default heap: for resources exclusively accessed by the GPU.
- Upload heap: for resources that require data uploads from the CPU to the GPU.
- Readback heap: for resources that need to be read back by the CPU.
Synchronization
CPU-GPU Synchronization
 
Fences are used for CPU-GPU synchronization. The example above shows how to safely Reset() a CommandAllocator by making sure that the GPU has finished executing all commands that were recorded into it and submitted to the CommandQueue.
To establish a synchronization point, the CPU calls CommandQueue::Signal() with a Fence and a fence value. When the GPU reaches the synchronization point, that is, once it has finished the commands submitted before the Signal(), it sets the Fence to the given value. Typically, the value is incremented by one every time a new synchronization point is established.
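One common way to use the incrementing fence value is to give each frame in flight its own CommandAllocator and remember the value signaled after that frame's submission. The following is a minimal sketch of that bookkeeping only, assuming two frames in flight; FrameRing and its members are hypothetical names, and `completedValue` stands for what Fence::GetCompletedValue() would return.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Assumption: two frames in flight, each owning its own command allocator.
constexpr int kFramesInFlight = 2;

// Hypothetical per-frame fence bookkeeping (not a D3D12 API).
struct FrameRing {
    std::array<std::uint64_t, kFramesInFlight> frameFence{};  // 0 = never submitted
    std::uint64_t lastSignaled = 0;  // incremented by one per synchronization point
    int current = 0;                 // index of the frame being recorded

    // Call after submitting the current frame: models Signal(fence, ++value).
    std::uint64_t SignalCurrentFrame() {
        frameFence[current] = ++lastSignaled;
        return frameFence[current];
    }
    // Move on to the next frame's resources.
    void Advance() { current = (current + 1) % kFramesInFlight; }

    // The current frame's allocator may be reset once the GPU's completed
    // value has reached the value signaled for that frame.
    bool CanResetCurrentAllocator(std::uint64_t completedValue) const {
        return completedValue >= frameFence[current];
    }
};
```

When CanResetCurrentAllocator() returns false, the CPU has to wait on the fence (blocking or polling) before reusing that frame's allocator.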
The CPU can check the Fence value in two ways. One is to
call Fence::GetCompletedValue(), which is non-blocking. The
other way is blocking: create a Windows event object, call
SetEventOnCompletion(), then
WaitForSingleObject(); the calling thread is put to sleep
until the GPU signals the Fence.
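The two ways of checking a Fence can be illustrated without a GPU. The sketch below is a CPU-only stand-in, assuming a second thread plays the GPU: ToyFence and SimulateGpuSignal() are hypothetical names, GetCompletedValue() is the non-blocking query, and WaitFor() plays the role of SetEventOnCompletion() followed by WaitForSingleObject(), implemented here with a standard condition variable rather than a Windows event.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical CPU-only stand-in for a fence (not ID3D12Fence).
class ToyFence {
public:
    // Called by the "GPU" when it reaches the synchronization point.
    void Signal(std::uint64_t value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            completed_ = value;
        }
        cv_.notify_all();
    }
    // Non-blocking check, like Fence::GetCompletedValue().
    std::uint64_t GetCompletedValue() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return completed_;
    }
    // Blocking wait: the calling thread sleeps until the value is reached.
    void WaitFor(std::uint64_t value) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [&] { return completed_ >= value; });
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable cv_;
    std::uint64_t completed_ = 0;
};

// Simulates the GPU finishing some work and then signaling `value`.
std::thread SimulateGpuSignal(ToyFence& fence, std::uint64_t value) {
    return std::thread([&fence, value] {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
        fence.Signal(value);
    });
}
```

A render loop would typically poll with GetCompletedValue() first and only fall back to the blocking wait when the value has not been reached yet.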
GPU Workload Synchronization
Unlike in OpenGL and previous versions of DirectX, a DX12 application must also manage synchronization between GPU workloads. For example, if shader A writes to a texture through an RTV or UAV and shader B then reads from it through an SRV or UAV, a synchronization point must be established between them to prevent a resource hazard (B reading data that A has not finished writing).
CommandList::ResourceBarrier() establishes a
synchronization point between GPU workloads. Two common types of
barriers are:
- Resource Transition Barrier: declares a transition in a resource’s usage.
- UAV Barrier: declares that all current UAV accesses to a resource must complete before future accesses can begin.
Each resource is associated with a usage, or state, that defines how it is currently used. A Resource Transition Barrier declares a change in a resource’s state. When barriers are encountered, the driver and GPU insert the synchronization needed to prevent resource hazards.
For example, when beginning a new frame, the previous frame’s front
buffer becomes the current frame’s back buffer. Before we can render to
this resource in the current frame, we must transition it from
D3D12_RESOURCE_STATE_PRESENT to
D3D12_RESOURCE_STATE_RENDER_TARGET. Then, once we have
rendered the current frame and are ready to Present(), we
perform another transition from
D3D12_RESOURCE_STATE_RENDER_TARGET back to
D3D12_RESOURCE_STATE_PRESENT.
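The per-frame transitions described above follow a simple pattern: each barrier records the resource's current state as "before" and the new usage as "after". The sketch below tracks one back buffer under that rule; State, TransitionBarrier, BackBuffer, and RecordFrame() are hypothetical stand-ins, not the real D3D12 types. In real code, each recorded entry corresponds to a D3D12_RESOURCE_BARRIER of type D3D12_RESOURCE_BARRIER_TYPE_TRANSITION passed to ResourceBarrier().

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for two resource states and a transition barrier.
enum class State { Present, RenderTarget };

struct TransitionBarrier {
    State before;
    State after;
};

// Tracks one back buffer's state and the barriers a command list would record.
struct BackBuffer {
    State state = State::Present;  // assumed to start in the PRESENT state
    std::vector<TransitionBarrier> recorded;

    // "before" is always the tracked current state, so it cannot mismatch.
    void Transition(State to) {
        if (to == state) return;  // already in the requested state: no barrier
        recorded.push_back({state, to});
        state = to;
    }
};

// One frame: PRESENT -> RENDER_TARGET, draw, RENDER_TARGET -> PRESENT.
void RecordFrame(BackBuffer& bb) {
    bb.Transition(State::RenderTarget);
    // ... clear and draw commands would be recorded here ...
    bb.Transition(State::Present);
}
```

Tracking the state on the CPU like this is one way to avoid passing a wrong "before" state, which the debug layer reports as an error.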
A UAV Barrier, on the other hand, synchronizes access to a UAV. It is typically needed when shader A writes to a UAV that shader B then reads from, or when two shaders write to the same UAV. In the first case, A must finish its work before B can start executing. In the second case, a barrier is needed to guarantee write order, unless we can guarantee that the shaders write to different parts of the UAV.
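The decision of when a UAV Barrier is needed between two consecutive shaders touching the same UAV can be written as a small predicate. This is a minimal sketch, assuming each access is described by a kind (read or write) and a half-open range of elements; Access, UavAccess, and NeedsUavBarrier() are hypothetical helpers, not D3D12 APIs. Besides the read-after-write and write-after-write cases above, the predicate conservatively also flags write-after-read.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of how one shader touches a UAV.
enum class Access { Read, Write };

struct UavAccess {
    Access access;
    std::uint64_t begin;  // first element touched
    std::uint64_t end;    // one past the last element touched
};

// Half-open ranges [begin, end) overlap if each starts before the other ends.
bool Overlaps(const UavAccess& a, const UavAccess& b) {
    return a.begin < b.end && b.begin < a.end;
}

// A barrier between A and B is needed if at least one of them writes and the
// regions overlap; two read-only accesses never need one.
bool NeedsUavBarrier(const UavAccess& a, const UavAccess& b) {
    bool anyWrite = a.access == Access::Write || b.access == Access::Write;
    return anyWrite && Overlaps(a, b);
}
```

For example, two shaders that write to the lower and upper halves of a buffer do not overlap, so no barrier is needed between them.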
Uploading Buffer Data
 
Buffers should be placed in the default heap for best performance.
However, resources on the default heap are not CPU-writeable. To upload
buffer data, the application must instead create a buffer on the upload
heap (“upload buffer”), upload data to that buffer, and then copy the
upload buffer into the original buffer with CopyResource()
or CopyBufferRegion().
The upload buffer cannot be released or re-used until the GPU
finishes executing the CopyResource() or
CopyBufferRegion() command. This requires CPU-GPU
synchronization as usual.
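The target buffer must also undergo the appropriate resource transitions during the transfer: for example, into D3D12_RESOURCE_STATE_COPY_DEST before the copy, and afterwards into the state in which it will be read (such as D3D12_RESOURCE_STATE_VERTEX_AND_CONSTANT_BUFFER for a vertex buffer).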
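The upload steps above can be laid out in order with a CPU-only toy, where plain byte vectors stand in for the upload-heap and default-heap buffers. This is a sketch under that assumption: ToyBuffer and UploadBuffer() are hypothetical names, the memcpy calls stand in for Map()/memcpy and CopyBufferRegion(), and the returned list of step descriptions only shows the order a real command list and fence would follow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a buffer on the upload or default heap.
struct ToyBuffer {
    std::vector<std::uint8_t> bytes;
};

// Copies `size` bytes from `src` into `defaultBuf` via `uploadBuf` and returns
// the ordered steps a real implementation would perform.
std::vector<std::string> UploadBuffer(ToyBuffer& defaultBuf, ToyBuffer& uploadBuf,
                                      const void* src, std::size_t size) {
    assert(uploadBuf.bytes.size() >= size && defaultBuf.bytes.size() >= size);
    std::vector<std::string> steps;
    // 1. CPU writes the data into the upload buffer (Map()/memcpy in D3D12).
    std::memcpy(uploadBuf.bytes.data(), src, size);
    steps.push_back("cpu: write upload buffer");
    // 2. Transition the target buffer so that it can be copied into.
    steps.push_back("barrier: initial state -> COPY_DEST");
    // 3. GPU copy (CopyBufferRegion() or CopyResource() in D3D12).
    std::memcpy(defaultBuf.bytes.data(), uploadBuf.bytes.data(), size);
    steps.push_back("gpu: copy upload -> default");
    // 4. Transition the target buffer into the state it will be read in.
    steps.push_back("barrier: COPY_DEST -> read state");
    // 5. Signal a fence; release or reuse the upload buffer only after the
    //    CPU has seen that fence value complete.
    steps.push_back("signal fence, wait before reusing upload buffer");
    return steps;
}
```

In a real application, steps 2 to 5 are recorded into a CommandList and executed asynchronously, which is why step 5 is what makes it safe to touch the upload buffer again.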
Diagrams rendered with PlantUML.
